    c-trie++: A Dynamic Trie Tailored for Fast Prefix Searches

    Given a dynamic set $K$ of $k$ strings of total length $n$ whose characters are drawn from an alphabet of size $\sigma$, a keyword dictionary is a data structure built on $K$ that provides locate, prefix search, and update operations on $K$. Under the assumption that $\alpha = w / \lg \sigma$ characters fit into a single machine word of $w$ bits, we propose a keyword dictionary that represents $K$ in $n \lg \sigma + \Theta(k \lg n)$ bits of space and supports all operations in $O(m / \alpha + \lg \alpha)$ expected time on an input string of length $m$ in the word RAM model. This data structure is complemented by an exhaustive practical evaluation, which highlights its practical usefulness, especially for prefix searches, one of the most elementary keyword dictionary operations.
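
The $O(m / \alpha)$ term comes from handling $\alpha = w / \lg \sigma$ characters per machine-word operation. The Python sketch below illustrates only that word-packing idea, not c-trie++ itself: it packs strings into simulated $w$-bit words using $\lceil \lg \sigma \rceil$ bits per character and finds the longest common prefix with one XOR per word. The helper names `pack`, `lcp_packed`, and `is_prefix` are hypothetical, and for brevity the sketch packs strings on the fly (an $O(m)$ step), whereas a real structure would keep them stored in packed form.

```python
def pack(text: str, alphabet: str, w: int = 64) -> tuple[list[int], int, int]:
    """Hypothetical helper: pack text into simulated w-bit words."""
    b = max(1, (len(alphabet) - 1).bit_length())  # b = ceil(lg sigma) bits per character
    alpha = w // b                                  # characters per machine word
    code = {ch: i for i, ch in enumerate(alphabet)}
    words = []
    for start in range(0, len(text), alpha):
        chunk = text[start:start + alpha]
        word = 0
        for ch in chunk:  # the first character ends up in the most significant position
            word = (word << b) | code[ch]
        word <<= b * (alpha - len(chunk))  # left-align a final, partial word
        words.append(word)
    return words, b, alpha


def lcp_packed(x: str, y: str, alphabet: str, w: int = 64) -> int:
    """Longest common prefix of x and y, comparing alpha characters per XOR."""
    xs, b, alpha = pack(x, alphabet, w)
    ys, _, _ = pack(y, alphabet, w)
    limit = min(len(x), len(y))  # padding beyond the shorter string must not count
    for i, (wx, wy) in enumerate(zip(xs, ys)):
        diff = wx ^ wy
        if diff:
            # The highest set bit of diff lies inside the first mismatching character.
            offset = (alpha * b - diff.bit_length()) // b
            return min(i * alpha + offset, limit)
    return limit


def is_prefix(p: str, s: str, alphabet: str, w: int = 64) -> bool:
    """Prefix test built on the packed LCP."""
    return lcp_packed(p, s, alphabet, w) == len(p)
```

For example, `lcp_packed("acgtacgtac", "acgtacgtgg", "acgt")` finds the mismatch at position 8 in a single word comparison, since all ten 2-bit characters fit into one 64-bit word.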

    Computing NP-Hard Repetitiveness Measures via MAX-SAT

    Repetitiveness measures reveal profound characteristics of datasets and give rise to compressed data structures and algorithms that work in compressed space. Alas, computing some of these measures is NP-hard, and straightforward computation is infeasible even for small datasets. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a wide variety of implementations exist for heuristically computing approximations, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for the exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and for texts of length even over a million for string attractors.
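
For intuition about what is being computed exactly: a string attractor of a text $T$ is a set $\Gamma$ of positions such that every substring of $T$ has at least one occurrence spanning some position in $\Gamma$. The hypothetical brute-force Python sketch below is not the paper's MAX-SAT formulation; it enumerates candidate sets by increasing size, so its running time is exponential and it only works on tiny strings, which illustrates why straightforward exact computation does not scale.

```python
from itertools import combinations


def is_attractor(text: str, gamma: set[int]) -> bool:
    """True if every distinct substring has an occurrence covering a position in gamma."""
    n = len(text)
    covered: dict[str, bool] = {}
    for i in range(n):
        for j in range(i + 1, n + 1):  # occurrence text[i:j] spans positions i..j-1
            hit = any(i <= p < j for p in gamma)
            sub = text[i:j]
            covered[sub] = covered.get(sub, False) or hit
    return all(covered.values())


def smallest_attractor(text: str) -> set[int]:
    """Exhaustive search by increasing size (hypothetical, exponential time)."""
    for size in range(len(text) + 1):
        for cand in combinations(range(len(text)), size):
            if is_attractor(text, set(cand)):
                return set(cand)
    return set(range(len(text)))  # unreachable: all positions always form an attractor
```

For `"abab"`, a single position cannot cover both `a` and `b`, so the search returns the two-position attractor `{0, 1}`.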

    Efficient String Dictionary Compression Using String Dictionaries

    A string dictionary is a data structure that stores a set of strings. Recently, cases have been reported in which the size of string dictionaries has become a critical problem in many applications. Consequently, compressed string dictionaries have been proposed that combine efficient implementation techniques, such as Trie and Front-Coding, with powerful text compression techniques, such as Re-Pair. In this paper, we propose new dictionary structures based on the strategy of using string dictionaries to compress string dictionaries, in order to improve existing compressed ones. Experiments on real-world datasets show that our string dictionaries can be constructed in less time than the Re-Pair versions while offering a competitive trade-off between space usage and lookup and extraction speed.
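
Front-Coding, one of the techniques mentioned above, stores a lexicographically sorted list of strings by replacing each string with the length of the prefix it shares with its predecessor plus the remaining suffix, restarting with a verbatim string at the start of each bucket so that decoding only has to scan one bucket. The Python sketch below shows plain Front-Coding only, not the proposed dictionary-on-dictionary compression; `front_encode`, `front_decode_at`, and the default bucket size of 4 are illustrative assumptions.

```python
def front_encode(sorted_strings: list[str], bucket_size: int = 4) -> list[tuple[int, str]]:
    """Encode each string as (shared-prefix length with predecessor, remaining suffix)."""
    encoded: list[tuple[int, str]] = []
    prev = ""
    for idx, s in enumerate(sorted_strings):
        if idx % bucket_size == 0:
            encoded.append((0, s))  # bucket head is stored verbatim
        else:
            lcp = 0
            while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
                lcp += 1
            encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded


def front_decode_at(encoded: list[tuple[int, str]], idx: int, bucket_size: int = 4) -> str:
    """Recover string idx by replaying its bucket from the verbatim head."""
    start = idx - idx % bucket_size
    s = encoded[start][1]
    for lcp, suffix in encoded[start + 1:idx + 1]:
        s = s[:lcp] + suffix  # keep the shared prefix, append the stored suffix
    return s
```

Larger buckets save more space on shared prefixes but make each access decode more entries, which is the space/speed trade-off such dictionaries tune.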

    Investigation of drugs for the prevention of doxorubicin-induced cardiac events using big data analysis

    Aim: Doxorubicin, an anthracycline anti-tumour agent, is an essential chemotherapeutic drug; however, the adverse events associated with doxorubicin usage, including cardiotoxicity, prevent patients from continuing treatment. Here, we used databases to explore existing approved drugs with potential preventive effects against doxorubicin-induced cardiac events and examined their efficacy and mechanisms. Methods: The Gene Expression Omnibus (GEO), Library of Integrated Network-based Cellular Signatures (LINCS), and Food and Drug Administration Adverse Event Reporting System (FAERS) databases were used to extract candidate prophylactic drugs. Mouse models of doxorubicin-induced cardiac events were generated by intraperitoneal administration of 20 mg/kg of doxorubicin on Day 1 and oral administration of candidate prophylactic drugs for 6 consecutive days, beginning the day before doxorubicin administration. On Day 6, mouse hearts were extracted and examined for mRNA expression of apoptosis-related genes. Results: GEO analysis showed that doxorubicin administration upregulated 490 genes and downregulated 862 genes, and LINCS data identified sirolimus, verapamil, minoxidil, prednisolone, guanabenz, and mosapride as drugs capable of counteracting these gene expression changes. Examination of the effects of these drugs on cardiac toxicity using FAERS identified sirolimus and mosapride as new prophylactic drug candidates. In model mice, mosapride and sirolimus suppressed the Bax/Bcl-2 mRNA ratio, which is elevated in doxorubicin-induced cardiotoxicity. These drugs also suppressed the expression of the inflammatory cytokines Il1b and Il6 and of markers associated with myocardial fibrosis, including Lgals3 and Timp1. Conclusion: These findings suggest that doxorubicin-induced cardiac events are suppressed by the administration of mosapride and sirolimus.

    Chromobox 2 Expression Predicts Prognosis after Curative Resection of Oesophageal Squamous Cell Carcinoma

    Background/Aim: To investigate the function of chromobox 2 (CBX2) in oesophageal squamous cell carcinoma (OSCC). Materials and Methods: We used quantitative reverse transcription PCR (qRT-PCR) and immunohistochemistry to determine CBX2 expression levels in 13 human OSCC cell lines and in clinical specimens from two independent cohorts of patients with OSCC. Results: PCR array analysis revealed that CBX2 was co-ordinately expressed with WNT5B in OSCC cell lines. qRT-PCR analysis of clinical samples revealed high tumour-specific CBX2 expression compared with normal oesophageal tissues. High CBX2 expression was significantly associated with shorter disease-specific survival, hematogenous recurrence, and overall recurrence. Analysis of tissue microarrays from one cohort revealed that patients with higher CBX2 levels tended to have shorter disease-specific survival. Conclusion: CBX2 overexpression in OSCC tissues may serve as a novel biomarker for predicting survival and hematogenous recurrence.

    The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force

    Japan COVID-19 Task Force: comprehensive analysis of gene expression in blood cells from COVID-19 patients (individual differences in the human genome sequence affect severity-dependent changes in gene expression). Kyoto University press release, 2022-08-23.
    Coronavirus disease 2019 (COVID-19) is a recently emerged infectious disease that has caused millions of deaths, yet a comprehensive understanding of its disease mechanisms is still lacking. In particular, studies of gene expression dynamics and the regulatory landscape in COVID-19-infected individuals are limited. Here, we report a thorough analysis of whole blood RNA-seq data from 465 genotyped samples from the Japan COVID-19 Task Force, including 359 severe and 106 non-severe COVID-19 cases. We discover 1169 putative causal expression quantitative trait loci (eQTLs), including 34 possible colocalizations with biobank fine-mapping results of hematopoietic traits in a Japanese population, 1549 putative causal splice QTLs (sQTLs; e.g., two independent sQTLs at TOR1AIP1), and biologically interpretable trans-eQTL examples (e.g., REST and STING1), all fine-mapped at single-variant resolution. We perform differential gene expression analysis and identify 198 genes with increased expression in severe COVID-19 cases, enriched for innate immune-related functions. Finally, we evaluate the limited but non-zero effect of COVID-19 phenotype on eQTL discovery and highlight the presence of COVID-19 severity-interaction eQTLs (ieQTLs; e.g., CLEC4C and MYBL2). Our study provides a comprehensive catalog of whole blood regulatory variants in the Japanese population, as well as a reference for transcriptional landscapes in response to COVID-19 infection.